arXiv:1502.05375v1 [cs.DS] 18 Feb 2015


On learning $k$-parities with and without noise

Arnab Bhattacharyya    Ameet Gadekar    Ninad Rajgopal

Department of Computer Science & Automation 
Indian Institute of Science 

February 19, 2015 


Abstract 

We first consider the problem of learning $k$-parities in the on-line mistake-bound model: given a
hidden vector $x \in \{0,1\}^n$ with $|x| = k$ and a sequence of "questions" $a_1, a_2, \ldots \in \{0,1\}^n$, where
the algorithm must reply to each question with $\langle a_i, x \rangle \pmod 2$, what is the best tradeoff between the
number of mistakes made by the algorithm and its time complexity? We improve the previous best result
of Buhrman et al. [BGM10] by an $\exp(k)$ factor in the time complexity.

Second, we consider the problem of learning $k$-parities in the presence of classification noise of rate
$\eta \in (0, 1/2)$. A polynomial time algorithm for this problem (when $\eta > 0$ and $k = \omega(1)$) is a
long-standing challenge in learning theory. Grigorescu et al. [GRV11] showed an algorithm running in time
$\binom{n}{k/2}^{1+4\eta^2+o(1)}$. Note that this algorithm inherently requires time $\binom{n}{k/2}$ even when the noise rate $\eta$ is
polynomially small. We observe that for sufficiently small noise rate, it is possible to break the $\binom{n}{k/2}$ barrier.
In particular, if for some function $f(n) = \omega(1)$ and $\alpha \in [1/2, 1)$, $k = n/f(n)$ and $\eta = o(f(n)^{-\alpha}/\log n)$,
then there is an algorithm for the problem with running time $\mathrm{poly}(n) \cdot \binom{n}{k}^{1-\alpha} \cdot e^{-k/4.01}$.


1 Introduction 


By now, the "Parity Problem" of Blum, Kalai and Wasserman [BKW03] has acquired widespread notoriety.
The question is simple enough to be in our second sentence: in order to learn a hidden vector $x \in \{0,1\}^n$,
what is the least number of random examples $(a, \ell)$ that need to be seen, where $a$ is uniformly chosen
from $\{0,1\}^n$ and $\ell = \sum_i a_i x_i \pmod 2$ with probability at least $1 - \eta$? Information-theoretically, $x$ can be
recovered after only $O(n)$ examples, even if the noise rate $\eta$ is close to 1/2. But if we add the additional
constraint that the running time of the learning algorithm be minimized, the barely subexponential running
time of [BKW03]'s algorithm, $2^{O(n/\log n)}$, still holds the record of being the fastest known for this problem!

Learning parities with noise is a central problem in theoretical computer science. It has incarnations
in several different areas of computer science, including coding theory as the problem of learning random
binary linear codes and cryptography as the "learning with errors" problem that underlies lattice-based
cryptosystems [Reg09, BV11]. In learning theory, the special case of the problem where the hidden vector
$x$ is known to be supported on a set of size $k$ much smaller than $n$ has great relevance. We refer to this
problem as learning $k$-parity with noise or $k$-LPN. Feldman et al. [FGKP09] showed that learning $k$-juntas,
as well as learning $2^k$-term DNFs from uniformly random examples and variants of these problems in which
the noise is adversarial instead of random, all reduce to the $k$-LPN problem. For the $k$-LPN problem, the
current record is that of Grigorescu, Reyzin and Vempala [GRV11] who showed a learning algorithm that
succeeds with constant probability, takes $\binom{n}{k/2}^{1+4\eta^2+o(1)}$ time and uses $\omega(k \log n)$ samples. When the
noise rate $\eta$ is close to 1/2, this running time is improved by an algorithm due to G. Valiant [Val12] that
runs in time $n^{0.8k} \cdot \mathrm{poly}(\frac{1}{1-2\eta})$. It is a wide open challenge to find a polynomial time algorithm for $k$-LPN
for growing $k$ or to prove a negative result.

Another outstanding challenge in machine learning is the problem of learning parities without noise in
the "attribute-efficient" setting [Blu96]. The algorithm is given access to a source of examples $(a, \ell)$ where $a$
is chosen uniformly from $\{0,1\}^n$ and $\ell = \sum_i a_i x_i \pmod 2$ with no noise, and the question is to learn $x$ while
simultaneously reducing the time complexity and the number of examples drawn by the algorithm. Again,
we focus on the case where $x$ has sparsity $k \ll n$. Information-theoretically, of course, $O(k \log n)$ examples
should be sufficient, as each linearly independent example reduces the number of consistent $k$-parities by a
factor of 2. But the fastest known algorithm making $O(k \log n)$ samples runs in time $O(\binom{n}{k/2})$ [KS06], and it
is open whether there exists a polynomial time algorithm for learning parities that is attribute-efficient, i.e. it
makes $\mathrm{poly}(k \log n)$ samples. Buhrman, García-Soriano and Matsliah [BGM10] give the current best tradeoffs
between the sample complexity and running time for learning parities in this noiseless setting. Notice that
with $O(n)$ samples, it is easy to learn the $k$-parity in polynomial time using Gaussian elimination.
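The Gaussian elimination baseline mentioned above can be sketched as follows. This is a minimal illustration, not code from the paper: vectors are encoded as Python integer bitmasks, and the names `parity` and `solve_gf2` are hypothetical. When the examples have full rank, the returned solution is the hidden vector.

```python
def parity(a, x):
    """Inner product <a, x> mod 2 for bitmask-encoded vectors."""
    return bin(a & x).count("1") % 2


def solve_gf2(examples):
    """Gaussian elimination over GF(2).

    examples: iterable of (a, label) with a an int bitmask and label in {0, 1}.
    Returns a bitmask x with <a, x> = label for every example (free variables
    are set to 0), or None if the system is inconsistent.
    """
    pivots = {}  # pivot bit -> (row, rhs); rows are kept fully reduced
    for a, label in examples:
        # Eliminate every existing pivot from the incoming row.
        for p, (row, rhs) in pivots.items():
            if (a >> p) & 1:
                a ^= row
                label ^= rhs
        if a == 0:
            if label:
                return None  # 0 = 1: no solution
            continue         # redundant equation
        p = a.bit_length() - 1
        # Clear the new pivot bit from the older rows.
        for q, (row, rhs) in list(pivots.items()):
            if (row >> p) & 1:
                pivots[q] = (row ^ a, rhs ^ label)
        pivots[p] = (a, label)
    return sum(1 << p for p, (_, rhs) in pivots.items() if rhs)
```

With roughly $n$ uniformly random examples the system has full rank with constant probability, which is why this baseline needs $O(n)$ samples rather than $O(k \log n)$.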

1.1 Our Results 

We first study the noiseless setting. Our main technical result is an improved tradeoff between the sample 
complexity and runtime for learning parities. 

Theorem 1.1. Let $k, t : \mathbb{N} \to \mathbb{N}$ be two functions¹ satisfying $\log\log n \ll k(n) \ll t(n) \ll n$. For any $\delta > 0$,
there is an algorithm that learns the concept class of $k$-parities on $n$ variables with confidence parameter $\delta$,
using $O(kn/t + \log\binom{t}{k} + \log(1/\delta))$ uniformly random examples and $e^{-k/4.01} \binom{t}{k} \cdot \mathrm{poly}(n) \cdot \log(1/\delta)$ running
time².

Actually, we prove our result in the mistake-bound model [Lit89] that is stronger than the PAC model
discussed above (in fact, strictly stronger assuming the existence of one-way functions [Blu94]). As a
consequence, a theorem of the above form also holds when the examples come from an arbitrary distribution.
We defer the statement of the result for the mistake-bound model to Section 3.

For comparison, let us quote the closely related result of Buhrman et al.:

Theorem 1.2 (Theorem 2.1 of [BGM10]). Let $k, t : \mathbb{N} \to \mathbb{N}$ be two functions satisfying $k(n) \le t(n) \le n$.
For any $\delta > 0$, there is an algorithm that learns the concept class of $k$-parities on $n$ variables with confidence
parameter $\delta$, using $O(kn/t + \log\binom{t}{k} + \log(1/\delta))$ uniformly random examples and $\binom{t}{k} \cdot \mathrm{poly}(n) \cdot \log(1/\delta)$ running
time.

Thus, in the comparable regime, our Theorem 1.1 improves the runtime complexity of Theorem 1.2
by an $\exp(k)$ factor while its sample complexity remains the same up to constant factors. Note that as $t$
approaches $k$, our algorithm makes $O(n)$ samples and takes $\mathrm{poly}(n)$ time, which is the complexity of the
Gaussian elimination approach. On the other hand, if $t = n/\log(n/k)$, our algorithm makes $O(k \log(n/k))$
samples and takes³ $\exp(-k) \cdot \binom{n/\log(n/k)}{k}$ time (ignoring polynomial factors), compared to the trivial approach
which explicitly keeps track of the subset of all the $k$-weight parities consistent with examples given so far
and which makes $O(k \log(n/k))$ samples and takes $O(\binom{n}{k})$ time.

We next examine the noisy setting. Here, our contribution is a simple, general observation that does not 
seem to have been explicitly made before. 

Theorem 1.3. Given an algorithm $\mathcal{A}$ that learns PAR($k$) over the uniform distribution with confidence
parameter $\delta$ using $s(\delta)$ samples and running time $t(\delta)$, there is an algorithm $\mathcal{A}'$ that solves the $k$-LPN
problem with noise rate $\eta \in (0, 1/3)$, using $O(s(\delta/2) \log(1/\delta))$ examples and running time
$\exp(O(H(3\eta/2) \cdot s(\delta/2) \cdot \log(1/\delta))) \cdot (t(\delta/2) + s(\delta/2) \log(1/\delta))$ and with confidence parameter $\delta$.

¹We assume throughout, as in [BGM10], that $k$ and $t$ are constructible in quadratic time.

²The "4.01" can be replaced by any constant more than 4.

³By $\exp(\cdot)$, we mean [...]

In the above, $H : [0,1] \to [0,1]$ denotes the binary entropy function $H(p) = p \log_2 \frac{1}{p} + (1-p) \log_2 \frac{1}{1-p}$.
The main conceptual message carried by Theorem 1.3 is that improving the sample complexity for efficient
learning of noiseless parity improves the running time for learning of noisy parity. For instance, if we use
Spielman's algorithm as $\mathcal{A}$, reported in [KS06], that learns $k$-parity using $O(k \log n)$ samples and $O(\binom{n}{k/2})$
running time, we immediately get the following:

Corollary 1.4. For any $\eta \in (0, 1/3)$ and constant confidence parameter, there is an algorithm for $k$-LPN
with sample complexity $O(k \log n)$ and running time $\binom{n}{k/2} \cdot \exp(O(H(3\eta/2) \cdot k \log n))$.

For comparison, the current best result of [GRV11] has runtime $\binom{n}{k/2}^{1+4\eta^2+o(1)}$ and sample complexity
$\omega(k \log n)$. In the regime under consideration, our algorithm's runtime has a worse exponent but an
asymptotically better sample complexity.

The result of [GRV11] requires $\binom{n}{k/2}$ time regardless of how small $\eta$ is. We show via Theorem 1.3 and
Theorem 1.1 that it is possible to break the $\binom{n}{k/2}$ barrier when $\eta$ is a small enough function of $n$.

Corollary 1.5. Suppose $k(n) = n/f(n)$ for some function $f : \mathbb{N} \to \mathbb{N}$ for which $f(n) \ll n/\log\log n$, and
suppose $\eta(n) = o(1/((f(n))^{\alpha} \log n))$ for some $\alpha \in [1/2, 1)$. Then, for constant confidence parameter, there exists
an algorithm for $k$-LPN with noise rate $\eta$ with running time $e^{-k/4.01+o(k)} \cdot \binom{n}{k}^{1-\alpha} \cdot \mathrm{poly}(n)$ and sample
complexity $O(k (f(n))^{\alpha})$.

We note that because of the results of Feldman et al. [FGKP09], the above results for $k$-LPN also extend
to the setting where the example source adversarially mislabels examples instead of randomly but with the
same rate $\eta$.

1.2 Our Techniques 

We first give an algorithm to learn parities in the noiseless setting in the mistake bound model. We use the
same approach as that of [BGM10] (which was itself inspired by [APY09]). The idea is to consider a family
$\mathcal{S}$ of subsets of $\{0,1\}^n$ such that the hidden $k$-sparse vector is contained inside one of the elements of $\mathcal{S}$. We
maintain this invariant throughout the algorithm. Now, each time an example comes, it specifies a halfspace
$H$ of $\{0,1\}^n$ inside which the hidden vector is lying. So, we can update $\mathcal{S}$ by taking the intersection of each
of its elements with $H$. If we can ensure that the set of points covered by the elements of $\mathcal{S}$ is decreasing by
a constant factor at every round, then after $O(\log \sum_{S \in \mathcal{S}} |S|)$ examples, the hidden vector is learned. The
runtime is determined by the number of sets in $\mathcal{S}$ times the cost of taking the intersection of each set with
a halfspace.

One can think of the argument of Buhrman et al. [BGM10] as essentially initializing $\mathcal{S}$ to be the set of
all $\binom{n}{k}$ subspaces spanned by $k$ standard basis vectors. The intersections of these subspaces with a halfspace
can be computed efficiently by Gaussian elimination. Our idea is to reduce the number of sets in $\mathcal{S}$. Note
that we can afford to make the size of each set in $\mathcal{S}$ larger by some factor $C$ because this only increases the
sample complexity by an additive $\log C$. Our approach is (essentially) to take $\mathcal{S}$ to be a random collection
of subspaces spanned by $\alpha k$ standard basis vectors, where $\alpha > 1$ is a sufficiently large constant. We show
that it is sufficient for the size of $\mathcal{S}$ to be smaller than $\binom{n}{k}$ by a factor that is exponential in $k$, so that
the running time is also improved by the same factor. Moreover, the sample complexity increases by only a
lower-order additive term.

Our second main contribution is a reduction from noiseless parity learning to noisy parity learning. The
algorithm is a simple exhaustive search which guesses the location of the mislabelings, corrects those labels,
applies the learner for noiseless parity and then verifies whether the output hypothesis matches the examples
by drawing a few more samples. Surprisingly, this seemingly immediate algorithm allows us to devise the
first algorithm which has a better running time than $\binom{n}{k/2}$ in the presence of a non-trivial amount of noise.
The lesson seems to be that if we hope to beat $\binom{n}{k/2}$ for constant noise rates, we should first address the
open question of Blum [Blu96] of devising an attribute-efficient algorithm to learn parity without noise.
2 Preliminaries


Let PAR($k$) be the class of all $f \in \{0,1\}^n$ of Hamming weight $k$. So, $|\mathrm{PAR}(k)| = \binom{n}{k}$. With each vector
$f \in \mathrm{PAR}(k)$, we associate a parity function $f : \{0,1\}^n \to \{0,1\}$ defined by $f(a) = \sum_{i=1}^{n} f_i a_i \pmod 2$.
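As a concrete reading of this definition, the following small sketch enumerates PAR($k$) and evaluates the associated parity function. The function names `par_k` and `parity` are illustrative choices, not notation from the paper, and vectors are represented as tuples of bits.

```python
from itertools import combinations
from math import comb


def par_k(n, k):
    """Yield every vector of Hamming weight k in {0,1}^n as a tuple of bits."""
    for support in combinations(range(n), k):
        yield tuple(1 if i in support else 0 for i in range(n))


def parity(f, a):
    """The parity function associated with f: sum_i f_i * a_i (mod 2)."""
    return sum(fi * ai for fi, ai in zip(f, a)) % 2
```

For instance, `len(list(par_k(5, 2)))` equals `comb(5, 2)`, matching $|\mathrm{PAR}(k)| = \binom{n}{k}$.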

Let $\mathcal{C}$ be a concept class of Boolean functions on $n$ variables, such as PAR($k$). We discuss two models
of learning in this work. One is Littlestone's online mistake bound model [Lit89]. Here, learning proceeds
in a series of rounds, where in each round, the learner is given an unlabeled boolean example $a \in \{0,1\}^n$
and must predict the value $f(a)$ of an unknown target function $f \in \mathcal{C}$. Once the learner predicts the value
of $f(a)$, the true value of $f(a)$ is revealed to the learner by the teacher. The mistake bound of a learning
algorithm is the worst-case number of mistakes that the algorithm makes over all sequences of examples and
all possible target functions $f \in \mathcal{C}$.

The second model of learning we consider is Valiant's famous PAC model [Val84] of learning from random
examples. Here, for an unknown target function $f \in \mathcal{C}$, the learner has access to a source of examples $(a, f(a))$
where $a$ is chosen independently from a distribution $\mathcal{D}$ on $\{0,1\}^n$. A learning algorithm is said to PAC-learn
$\mathcal{C}$ with sample complexity $s$, running time $t$, approximation parameter $\varepsilon$ and confidence parameter $\delta$ if for
all distributions $\mathcal{D}$ and all target functions $f \in \mathcal{C}$, the algorithm draws at most $s$ samples from the example
source, runs for time at most $t$ and outputs a function $f^*$ such that, with probability at least $1 - \delta$:

$$\Pr_{a \sim \mathcal{D}}[f(a) \ne f^*(a)] < \varepsilon$$

Often in this paper (e.g., all of the Introduction), we consider PAC-learning over the uniform distribution,
in which case $\mathcal{D}$ is fixed to be uniform on $\{0,1\}^n$. Notice that for learning PAR($k$) over the uniform
distribution, we can take $\varepsilon = 1/2$ because any two distinct parities differ on half of $\{0,1\}^n$.
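The claim that two distinct parities differ on exactly half of $\{0,1\}^n$ can be checked exhaustively for small $n$. The helper below is a hypothetical illustration (the name `disagreement_fraction` is not from the paper); it simply counts the points on which two parity functions disagree.

```python
from itertools import product


def disagreement_fraction(f, g):
    """Fraction of points a in {0,1}^n on which the parities f and g differ."""
    n = len(f)
    differ = 0
    for a in product((0, 1), repeat=n):
        fa = sum(x * y for x, y in zip(f, a)) % 2
        ga = sum(x * y for x, y in zip(g, a)) % 2
        differ += fa != ga
    return differ / 2 ** n
```

The reason is that $f(a) + g(a) \pmod 2$ is itself the parity given by $f \oplus g$, which is nonzero when $f \ne g$ and therefore balanced.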

There are standard conversion techniques which can be used to transform any mistake-bound algorithm 
into a PAC learning algorithm (over arbitrary distributions): 

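Theorem 2.1 ([Ang88, Hau88, Lit89]). Any algorithm $\mathcal{A}$ that learns a concept class $\mathcal{C}$ in the mistake-bound
model with mistake bound $m$ and running time $t$ per round can be converted into an algorithm $\mathcal{A}'$ that
PAC-learns $\mathcal{C}$ with sample complexity $O(\frac{1}{\varepsilon} m + \frac{1}{\varepsilon} \log \frac{1}{\delta})$, running time $O(\frac{1}{\varepsilon} m t + \frac{t}{\varepsilon} \log \frac{1}{\delta})$, approximation parameter
$\varepsilon$, and confidence parameter $\delta$.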


The $k$-LPN problem with noise rate $\eta$, introduced in Section 1, corresponds to the problem of
PAC-learning PAR($k$) under the uniform distribution, when the example source can mislabel examples with a
rate $\eta \in (0, 1/2)$. More generally, one can study the $k$-LPN problem over $\mathcal{D}$, an arbitrary distribution.
[GRV11] show the following for this problem:


Theorem 2.2 (Theorem 5 of [GRV11]). For any $\varepsilon, \delta, \eta \in (0, 1/2)$, and distribution $\mathcal{D}$ over $\{0,1\}^n$, the
$k$-LPN problem over $\mathcal{D}$ with noise rate $\eta$ can be solved using [...] samples in time
[...] $\cdot \binom{n}{k/2}^{[...]}$, where $\varepsilon$ and $\delta$ are the approximation and confidence parameters respectively.


3 In the absence of noise 


We state the main result of this section. 

Theorem 3.1. Let $k, t : \mathbb{N} \to \mathbb{N}$ be two functions such that $\log\log n \ll k(n) \ll t(n) \ll n$. Then for every
$n \in \mathbb{N}$, there is an algorithm that learns PAR($k$) in the mistake-bound model, with mistake bound at most
$(1 + o(1)) \frac{kn}{t} + \log\binom{t}{k}$ and running time per round $e^{-k/4.01} \cdot \binom{t}{k} \cdot \tilde{O}((kn/t)^2)$.

Using Theorem 2.1, we directly obtain Theorem 1.1. In fact, since Theorem 2.1 produces a
PAC-learner over any distribution, a statement of the form of Theorem 1.1 holds for examples obtained from any
distribution.

For comparison, we quote the relevant result of [BGM10] in the mistake-bound model.
Theorem 3.2 (Theorem 2.1 of [BGM10]). Let $k, t : \mathbb{N} \to \mathbb{N}$ be two functions such that $k(n) \le t(n) \le n$.
Then for every $n \in \mathbb{N}$, there is a deterministic algorithm that learns PAR($k$) in the mistake-bound model,
with mistake bound at most $k \lceil n/t \rceil + \log\binom{t}{k}$ and running time per round $\binom{t}{k} \cdot O((kn/t)^2)$.

Note that their mistake bound is better by a lower-order term which we do not see how to avoid in our
setup. This slack is not enough though to recover Theorem 3.1 from Theorem 3.2: dividing $t$ by $C$ roughly
multiplies the sample complexity by $C$ and divides the running time by $C^k$ in [BGM10]'s algorithm, whereas
in our algorithm, dividing $t$ by $C$ roughly multiplies the sample complexity by $C$ and divides the running
time by $(1.28C)^k$.


3.1 The Algorithm 

Let $f \in \{0,1\}^n$ be the hidden vector of sparsity $k$ that the learning algorithm is trying to learn. Let
$e = \{e_1, e_2, \ldots, e_n\}$ be the standard basis of the vector space $\{0,1\}^n$.

Let $\alpha$ be a large constant we set later, and let $T = \alpha t$. Note that $T \ll n$. We define an arbitrary partition
$\pi = C_1, C_2, \ldots, C_T$ on the set $e$ into $T$ parts, each of size at most $\lceil n/T \rceil$. Next, let $S_1, \ldots, S_m \subseteq [T]$ be $m$
random subsets of $[T]$, each of size $\alpha k$. We choose $m$ to ensure the following:


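Claim 3.3. If $m = O\left( \frac{\binom{T}{\alpha k}}{\binom{T-k}{\alpha k-k}} \log\binom{T}{k} \right)$, then with nonzero probability, for every set $A \subseteq [T]$ of size $k$,
$A \subseteq S_i$ for some $i \in [m]$.

Proof. This follows from the simple observation that for any fixed $i \in [m]$,
$\Pr[A \subseteq S_i] = \binom{T-k}{\alpha k-k} / \binom{T}{\alpha k}$, and so,

$$\Pr[\nexists\, i \in [m],\ A \subseteq S_i] = \left( 1 - \frac{\binom{T-k}{\alpha k-k}}{\binom{T}{\alpha k}} \right)^{m} \le \exp\left( -m \cdot \frac{\binom{T-k}{\alpha k-k}}{\binom{T}{\alpha k}} \right).$$

Choosing $m = 2 \cdot \frac{\binom{T}{\alpha k}}{\binom{T-k}{\alpha k-k}} \log\binom{T}{k}$ and applying the union bound over the $\binom{T}{k}$ choices of $A$ finishes the proof.

□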
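Claim 3.3 can be tried out on toy parameters. The sketch below is an illustration under stated assumptions, not the paper's procedure: it computes $m$ from the choice in the proof (taking $\log$ to be base 2, which the source does not specify) and redraws the random family until every $k$-subset of $[T]$ is covered. The names `claim_m`, `covers_all` and `sample_cover` are hypothetical.

```python
import math
import random
from itertools import combinations


def claim_m(T, k, alpha):
    """m = 2 * C(T, ak) / C(T-k, ak-k) * log2 C(T, k), rounded up."""
    p = math.comb(T - k, alpha * k - k) / math.comb(T, alpha * k)  # Pr[A in S_i]
    return math.ceil(2 / p * math.log2(math.comb(T, k)))


def covers_all(subsets, T, k):
    """True if every k-subset A of range(T) lies inside some member of subsets."""
    return all(any(set(A) <= S for S in subsets)
               for A in combinations(range(T), k))


def sample_cover(T, k, alpha, seed=0):
    """Draw m random (alpha*k)-subsets of range(T) until they cover all k-subsets."""
    rng = random.Random(seed)
    m = claim_m(T, k, alpha)
    while True:  # each attempt succeeds with nonzero probability by Claim 3.3
        subsets = [set(rng.sample(range(T), alpha * k)) for _ in range(m)]
        if covers_all(subsets, T, k):
            return subsets
```

The exhaustive check in `covers_all` is only feasible for very small $T$ and $k$; the paper's argument needs only the existence of a good family.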


We fix some choice of $S_1, \ldots, S_m \subseteq [T]$ that satisfies the conclusion of Claim 3.3 for what follows. In
fact, the rest is exactly [BGM10]'s algorithm, which we reproduce for completeness.


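For every $i \in [m]$, let $M_i \subseteq \{0,1\}^n$ be the span of $\bigcup_{j \in S_i} C_j$. Note that

$$\left| \bigcup_{j \in S_i} C_j \right| \le \alpha k \lceil n/T \rceil \le \alpha k \left( \frac{n}{T} + 1 \right) = \frac{kn}{t} + \alpha k = (1 + o(1)) kn/t,$$

as $t \ll n$ and $\alpha$ is a constant. So, $M_i$ is a linear subspace containing at most $2^{(1+o(1))kn/t}$ points.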


Note that every $f \in \{0,1\}^n$ with $|f| = k$ is contained in some $M_i$. This is simply because every set of $k$
standard basis vectors is contained in at most $k$ of the $T$ parts in the partition $\pi$, and by Claim 3.3, every
subset of $[T]$ of size $k$ is contained in some $S_i$.

Initially, the unknown target vector $f$ can be in any of the $M_i$'s. Consider what happens when the learner
sees an example $a \in \{0,1\}^n$ and a label $y \in \{0,1\}$. For $i \in [m]$, let $M_i(a, y) = \{f \in M_i : f(a) = y\}$. $M_i(a, y)$
may be of size 0, $|M_i|$ or $|M_i|/2$. Note that the size of $M_i(a, y)$ can be efficiently found using Gaussian
elimination.
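One way to realize this size computation is to keep, for each space, a reduced list of the linear constraints seen so far. The class below is a sketch under assumptions of our own: vectors are integer bitmasks, `coords` is the set of coordinates spanning the space, and the name `AffineSpace` and its methods are hypothetical rather than taken from the paper or from [BGM10].

```python
class AffineSpace:
    """Vectors f supported on `coords` that satisfy all recorded constraints
    <a, f> = y (mod 2). Constraints are stored as a reduced XOR basis."""

    def __init__(self, coords):
        self.mask = sum(1 << c for c in set(coords))
        self.dim = len(set(coords))
        self.basis = {}      # leading bit -> (row, rhs)
        self.empty = False

    def _reduce(self, a, y):
        """Reduce the constraint (a, y) against the stored basis."""
        a &= self.mask       # coordinates outside the span contribute 0
        while a:
            p = a.bit_length() - 1
            if p not in self.basis:
                break
            row, rhs = self.basis[p]
            a ^= row
            y ^= rhs
        return a, y

    def size(self):
        return 0 if self.empty else 1 << (self.dim - len(self.basis))

    def size_if(self, a, y):
        """Size of the space after adding <a, f> = y, without modifying it."""
        if self.empty:
            return 0
        r, z = self._reduce(a, y)
        if r:                       # independent constraint: space halves
            return self.size() // 2
        return 0 if z else self.size()

    def restrict(self, a, y):
        """Intersect the space with the set {f : <a, f> = y}."""
        if self.empty:
            return
        r, z = self._reduce(a, y)
        if r:
            self.basis[r.bit_length() - 1] = (r, z)
        elif z:
            self.empty = True
```

The three possible return values of `size_if` correspond exactly to the three cases 0, $|M_i|/2$ and $|M_i|$ noted above.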

We are now ready to describe the algorithm: 


- Initialization: The learning algorithm begins with a set of affine spaces $N_i$, $i \in [m]$, each represented by a
system of linear equations. Initialize the affine spaces $N_i = M_i$ for all $i \in [m]$.

- On receiving an example $a \in \{0,1\}^n$: Predict its label $\hat{y} \in \{0,1\}$ such that
$\sum_{i \in [m]} |N_i(a, \hat{y})| \ge \sum_{i \in [m]} |N_i(a, 1 - \hat{y})|$.

- On receiving the answer from the teacher $y = f(a)$: Update $N_i$ to $N_i(a, y)$ for each $i \in [m]$.
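The three steps above can be sketched on toy instances as follows. To keep the example short, each $N_i$ is stored as an explicit Python set of candidate vectors instead of a system of linear equations, so this version is exponentially slower than the algorithm in the text and only illustrates the prediction rule. All names (`HalvingLearner`, `run`, `coordinate_subspaces`) are hypothetical.

```python
from itertools import combinations


def parity(f, a):
    """Inner product <f, a> mod 2 for bitmask-encoded vectors."""
    return bin(f & a).count("1") % 2


class HalvingLearner:
    def __init__(self, spaces):
        self.spaces = [set(N) for N in spaces]   # N_1, ..., N_m
        self.mistakes = 0

    def predict(self, a):
        # Predict the label y_hat with sum_i |N_i(a, y_hat)| >= sum_i |N_i(a, 1 - y_hat)|.
        ones = sum(parity(f, a) for N in self.spaces for f in N)
        total = sum(len(N) for N in self.spaces)
        return 1 if 2 * ones >= total else 0

    def update(self, a, y):
        # Replace each N_i by N_i(a, y).
        self.spaces = [{f for f in N if parity(f, a) == y} for N in self.spaces]


def run(learner, target, questions):
    """Play the mistake-bound game against a teacher holding `target`."""
    for a in questions:
        y = parity(target, a)
        if learner.predict(a) != y:
            learner.mistakes += 1
        learner.update(a, y)
    return learner


def coordinate_subspaces(n, k):
    """All subspaces spanned by k standard basis vectors, as explicit sets."""
    spaces = []
    for coords in combinations(range(n), k):
        space = {0}
        for c in coords:
            space |= {f | (1 << c) for f in space}
        spaces.append(space)
    return spaces
```

Each mistake at least halves $\sum_i |N_i|$, so with 6 subspaces of 4 points each (the case $n = 4$, $k = 2$) the learner can make at most $\lfloor \log_2 24 \rfloor = 4$ mistakes.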
3.2 Analysis

Before we analyze the algorithm, we first establish a combinatorial claim that is the crux of our improvement: 


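Lemma 3.4. If $\alpha$ is a large enough constant,

$$\frac{\binom{T}{\alpha k}}{\binom{T-k}{\alpha k-k}} \le e^{-k/4.01} \binom{t}{k}.$$

Proof.

$$\frac{\binom{T}{\alpha k}}{\binom{T-k}{\alpha k-k} \binom{t}{k}} = \prod_{i=0}^{k-1} \frac{k-i}{\alpha k-i} \cdot \frac{T-i}{t-i} = \prod_{i=0}^{k-1} \left( 1 - \frac{i(\alpha-1)(t-k)}{(\alpha k-i)(t-i)} \right) \le \prod_{i=1}^{k-1} \left( 1 - \frac{0.999}{1 + \frac{\alpha}{\alpha-1}\left(\frac{k}{i} - 1\right)} \right) \qquad (1)$$

where the equalities are routine calculation and the inequality is using that $k(n) \ll t(n)$. Each individual
term in the product is strictly less than 1. So, keeping only the terms with $i \ge k/(2-\varepsilon)$, the above is bounded by:

$$\left( 1 - \frac{0.999}{1 + \frac{\alpha}{\alpha-1}(1-\varepsilon)} \right)^{\frac{(1-\varepsilon)k}{2-\varepsilon}} \le \exp\left( -k \cdot \frac{0.999(1-\varepsilon)}{(2-\varepsilon)\left(1 + \frac{\alpha}{\alpha-1}(1-\varepsilon)\right)} \right) < e^{-k/4.01}$$

for a small enough constant $\varepsilon > 0$ and large enough constant $\alpha > 1$.

□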
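The product identity in the proof and the final inequality can be checked with exact arithmetic on concrete numbers. The sketch below is only a sanity check for one hypothetical parameter choice ($t = 2000$, $k = 20$, $\alpha = 10$, picked for illustration); it proves nothing about the asymptotic statement.

```python
from fractions import Fraction
from math import comb, exp


def lemma_ratio(t, k, alpha):
    """C(T, ak) / (C(T-k, ak-k) * C(t, k)) with T = alpha * t, as an exact fraction."""
    T = alpha * t
    return Fraction(comb(T, alpha * k), comb(T - k, alpha * k - k) * comb(t, k))


def product_form(t, k, alpha):
    """The same quantity written as prod_{i<k} (k-i)(T-i) / ((ak-i)(t-i))."""
    T = alpha * t
    out = Fraction(1)
    for i in range(k):
        out *= Fraction((k - i) * (T - i), (alpha * k - i) * (t - i))
    return out
```

For these parameters the ratio is far below $e^{-k/4.01}$, in line with the lemma holding with room to spare when $\alpha$ is large and $k \ll t$.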


Proof of Theorem 3.1. Fix $\alpha$ to be a constant that makes the conclusion of Lemma 3.4 true.

We first check that the invariant $f \in \bigcup_{i \in [m]} N_i$ is maintained throughout the algorithm. This holds
at initialization by the argument given earlier. After that, obviously, if $f \in N_i$, then $f \in N_i(a, f(a))$ for any
$a \in \{0,1\}^n$, and so the invariant holds. Therefore, if the algorithm terminates, it will find the hidden vector
$f$ and return it as the solution. The rate of convergence is precisely captured by the number of mistakes
the learning algorithm makes, which we describe next.

Mistake Bound. Notice that when the algorithm begins, the sum of the sizes of all the affine spaces is
$\sum_{i \in [m]} |N_i| \le m \cdot 2^{(1+o(1))kn/t}$. Now whenever the learner makes a mistake by predicting $\hat{y} \ne y$, the total size
$\sum_{i \in [m]} |N_i|$ of all affine spaces reduces by a factor of at least 2. This is due to the definition of $\hat{y}$ and the fact
that $|N_i(a, y)| + |N_i(a, 1-y)| = |N_i|$.

Hence, using Lemma 3.4, after at most

$$\log \sum_{i \in [m]} |N_i| \le \log\left( O\left( \frac{\binom{T}{\alpha k}}{\binom{T-k}{\alpha k-k}} \log\binom{T}{k} \right) \cdot 2^{(1+o(1))kn/t} \right) \le (1 + o(1)) kn/t + \log\binom{t}{k} - \Omega(k) + \log O\left( \log\binom{T}{k} \right)$$

mistakes, the size of $\bigcup_{i \in [m]} N_i$ will decrease to 1, which by the invariant above will imply that $\bigcup_{i \in [m]} N_i = \{f\}$,
and hence the learner makes no more mistakes. Since we assume $k \gg \log\log n$ and $t \ll n$, we can bound
the number of mistakes by: $(1 + o(1)) kn/t + \log\binom{t}{k}$.

Running Time. We analyze the running time of the learner for each round. At each round, for a question
$a \in \{0,1\}^n$, we need to compute $|N_i(a, 0)|$ and $|N_i(a, 1)|$ as well as store a representation of the updated
$N_i$. Now, since each $N_i$ is spanned by at most $\ell = (1 + o(1)) kn/t$ basis vectors, we can treat each $N_i$ as
a linear subspace in $\{0,1\}^{\ell}$. $N_i(a, 0)$ and $N_i(a, 1)$ can be computed by performing Gaussian elimination on
a system of linear equations involving $\ell$ variables, which takes $O(\ell^2)$ time. Thus, the total running time is
$O(m \ell^2)$, which using Lemma 3.4 is exactly the bound claimed in Theorem 3.1. □


4 In the presence of noise 

Recall the $k$-LPN problem. In this section, we show a reduction from $k$-LPN to noiseless learning of PAR($k$)
and its applications.

4.1 The Reduction 

We focus on the case when the noise rate $\eta$ is bounded by a constant less than half.

Theorem 1.3 (recalled) Given an algorithm $\mathcal{A}$ that learns PAR($k$) over the uniform distribution with
confidence parameter $\delta$ using $s(\delta)$ samples and running time $t(\delta)$, there is an algorithm $\mathcal{A}'$ that solves the $k$-LPN
problem with noise rate $\eta \in (0, 1/3)$, using $O(s(\delta/2) \log(1/\delta))$ examples and running time
$\exp(O(H(3\eta/2) \cdot s(\delta/2) \cdot \log(1/\delta))) \cdot (t(\delta/2) + s(\delta/2) \log(1/\delta))$ and with confidence parameter $\delta$.

Let $\mathcal{A}(\delta)$ be a PAC-learning algorithm over the uniform distribution for PAR($k$) of length $n$ with
confidence parameter $\delta$ that draws $s(\delta)$ examples and runs in time $t(\delta)$. Below is our algorithm for $k$-LPN. Here,
$H$ denotes the binary entropy function $p \mapsto p \log_2(1/p) + (1-p) \log_2(1/(1-p))$.


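Noisy($\delta$, $\eta$)

1: Draw $s' = 20 s(\delta/2) \log(1/\delta)$ random examples $(a_1, \ell_1), \ldots, (a_{s'}, \ell_{s'}) \in \{0,1\}^n \times \{0,1\}$.
2: for all $S \subseteq [s']$, $|S| \le \frac{3}{2} \eta s'$ do
3:     for $i \in [s']$ do
4:         if $i \in S$ then $\tilde{\ell}_i \leftarrow 1 - \ell_i$
5:         else $\tilde{\ell}_i \leftarrow \ell_i$
6:         end if
7:     end for
8:     $x_S \leftarrow \mathcal{A}(\delta/2)$ applied to examples $(a_1, \tilde{\ell}_1), \ldots, (a_{s'}, \tilde{\ell}_{s'})$.
9: end for
10: Draw $s'' = 600(s' \cdot H(3\eta/2) + \log(8/\delta))$ random examples $(b_1, m_1), \ldots, (b_{s''}, m_{s''}) \in \{0,1\}^n \times \{0,1\}$
11: $S^* \leftarrow \arg\max_{S \subseteq [s'], |S| \le 3\eta s'/2} |\{i \in [s''] : \langle b_i, x_S \rangle = m_i\}|$
12: return $x_{S^*}$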
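The structure of Noisy can be sketched in a few lines for toy inputs. This is an illustration under assumptions that differ from the pseudocode: the examples are passed in as lists instead of being drawn, the flip budget `max_flips` stands in for $3\eta s'/2$, and the noiseless learner $\mathcal{A}$ is replaced by a brute-force stand-in (`brute_force_learner`, a hypothetical helper) that may return `None` when no weight-$k$ vector fits.

```python
from itertools import combinations


def parity(x, a):
    """Inner product <x, a> mod 2 for bitmask-encoded vectors."""
    return bin(x & a).count("1") % 2


def brute_force_learner(examples, n, k):
    """Stand-in for A: first weight-k vector consistent with all examples, else None."""
    for support in combinations(range(n), k):
        x = sum(1 << i for i in support)
        if all(parity(x, a) == label for a, label in examples):
            return x
    return None


def noisy(train, check, n, k, max_flips):
    """Guess the mislabeled positions, relearn, and keep the best-validated guess."""
    candidates = []
    for size in range(max_flips + 1):
        for S in combinations(range(len(train)), size):   # guessed mislabelings
            corrected = [(a, label ^ (i in S))
                         for i, (a, label) in enumerate(train)]
            x_S = brute_force_learner(corrected, n, k)
            if x_S is not None:
                candidates.append(x_S)
    # Lines 10-12: score each hypothesis on the fresh examples, return the best.
    return max(candidates,
               key=lambda x: sum(parity(x, b) == m for b, m in check))
```

The outer loop visits $\sum_{i \le \text{max\_flips}} \binom{s'}{i}$ subsets, which is the source of the $\exp(O(H(3\eta/2) \cdot s'))$ factor in the running time.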


Proof of Theorem 1.3. 

Lemma 4.1. The sample complexity of Noisy is $s' + s'' = O(s(\delta/2) \log(1/\delta))$.

Proof. Immediate. □

Lemma 4.2. The running time of Noisy is $\exp(O(H(3\eta/2) \cdot s(\delta/2) \cdot \log(1/\delta))) \cdot (t(\delta/2) + s(\delta/2) \log(1/\delta))$.

Proof. We use the standard estimate $\sum_{i=0}^{\alpha n} \binom{n}{i} \le 2^{H(\alpha) n}$ for $\alpha \le 1/2$. The bound is then immediate. □
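The entropy estimate used in this proof is easy to check numerically. In the sketch below, `binary_entropy` and `log2_num_flip_sets` are hypothetical helper names; the second function counts, on a log scale, the subsets that the outer loop of Noisy enumerates for a given number of samples and noise rate.

```python
import math


def binary_entropy(p):
    """H(p) = p log2(1/p) + (1 - p) log2(1/(1 - p)), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


def log2_num_flip_sets(s_prime, eta):
    """log2 of the number of subsets S of [s'] with |S| <= 3 * eta * s' / 2."""
    bound = int(1.5 * eta * s_prime)
    return math.log2(sum(math.comb(s_prime, i) for i in range(bound + 1)))
```

Since $\eta < 1/3$ gives $3\eta/2 < 1/2$, the estimate applies with $\alpha = 3\eta/2$ and $n = s'$, which is where the restriction on the noise rate in Theorem 1.3 enters.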

Lemma 4.3. If $x$ is the hidden vector and $x^*$ is output by Noisy($\delta$, $\eta$), then with probability at least $1 - \delta$,
$x^* = x$.

Proof. We make the assumption throughout that $\eta = \Omega(1/s(\delta/2))$, as otherwise, with high probability, the
example source won't mislabel any of the $s(\delta/2)$ samples and so the reduction is trivial.

Let $T = \{i \in [s'] : \langle a_i, x \rangle \ne \ell_i\}$ be the subset of the $s'$ samples drawn in line 1 that are mislabeled by the
example source. By the Chernoff bound, $\Pr[|T| > 3\eta s'/2] \le \delta/4$. If $|T| \le 3\eta s'/2$, we have with
probability at least $1 - \delta/2$, $x_T = x$. Thus, for any $i \in [s'']$, $\Pr_{b_i}[\langle x_T, b_i \rangle \ne m_i] \le \eta$. On the other hand, for
all $x_S \ne x_T$, $\Pr_{b_i}[\langle x_S, b_i \rangle \ne \langle x, b_i \rangle] = 1/2$, and so $\Pr_{b_i}[\langle x_S, b_i \rangle \ne m_i] = 1/2$ as the noise is random. Again,
using Chernoff bounds,

$$\Pr[\exists S \ne T \text{ s.t. } |\{i \in [s''] : \langle b_i, x_S \rangle \ne m_i\}| \le 5s''/12] \le 2^{H(3\eta/2) s'} \cdot e^{-s''/450} \le \frac{\delta}{8}.$$

On the other hand, for $x_T$ itself, $\Pr[|\{i \in [s''] : \langle b_i, x_T \rangle \ne m_i\}| > 5s''/12] \le \frac{\delta}{8}$ by a similar use of Chernoff
bounds. So, in all, with probability at least $1 - \delta$, $x_T$ will be returned in step 12. □

□

When the noise rate $\eta$ is more than 1/3, a similar reduction can be given by adjusting the parameters
accordingly. Also, when the distribution is arbitrary but the noise rate is less than 1/4, a similar reduction
can be made to work. In the latter case, $\mathcal{A}$ is invoked with a smaller approximation parameter than the one
given to Noisy so that the filtering step in lines 10 and 11 works.

4.2 Applications 

An immediate application of Theorem 1.3 is obtained by letting $\mathcal{A}$ be the current fastest known
attribute-efficient algorithm for learning PAR($k$), the algorithm due to Spielman⁴ [KS06] that makes $O(k \log n)$ samples
and takes $O(\binom{n}{k/2})$ time (for constant confidence parameter $\delta$). (We ignore the confidence parameter in this
section for simplicity.)

Corollary 1.4 (recalled) For any $\eta \in (0, 1/3)$ and constant confidence parameter, there is an algorithm for
$k$-LPN with sample complexity $O(k \log n)$ and running time $\binom{n}{k/2} \cdot \exp(O(H(3\eta/2) \cdot k \log n))$.

Proof. Immediate from Theorem 1.3. □

Our next application of Theorem 1.3 uses our improved PAR($k$) learning algorithm from Section 3.

Corollary 1.5 (recalled) Suppose $k(n) = n/f(n)$ for some function $f : \mathbb{N} \to \mathbb{N}$ for which $1 \ll f(n) \ll
n/\log\log n$, and suppose $\eta(n) = o(1/((f(n))^{\alpha} \log n))$ for some $\alpha \in [1/2, 1)$. Then, for constant confidence
parameter, there exists an algorithm for $k$-LPN with noise rate $\eta$ with running time $e^{-k/4.01+o(k)} \cdot \binom{n}{k}^{1-\alpha} \cdot
\mathrm{poly}(n)$ and sample complexity $O(k (f(n))^{\alpha})$.

Proof. Let $\mathcal{A}$ be the algorithm of Theorem 1.1 with $t(n) = \lceil n/(f(n))^{\alpha} \rceil$. The running time of $\mathcal{A}$ is
$e^{-k/4.01} \cdot \binom{t}{k} \cdot \mathrm{poly}(n)$ and its sample complexity is $O(k (f(n))^{\alpha})$. Now, applying Theorem 1.3, we see that since
$H(1.5\eta) = o((f(n))^{-\alpha})$, the running time for Noisy is only a $2^{o(k)}$ factor times the running time of $\mathcal{A}$. This
yields our desired result.

□
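The step "$H(1.5\eta) = o((f(n))^{-\alpha})$ implies a $2^{o(k)}$ overhead" can be unpacked as below. This is our own expansion of the argument for constant confidence parameter, using only the general fact that $H(p) = O(p \log(1/p))$ as $p \to 0$ and that $p \log(1/p)$ is increasing for small $p$; it is a sketch and not taken from the paper.

```latex
% With \eta = o(\eta_0) for \eta_0 = 1/((f(n))^{\alpha} \log n), and \log(1/\eta_0) = O(\log n):
\begin{align*}
H(3\eta/2) &= O\big(\eta \log(1/\eta)\big)
            = o\big(\eta_0 \log(1/\eta_0)\big)
            = o\!\left(\frac{\log n}{(f(n))^{\alpha} \log n}\right)
            = o\big((f(n))^{-\alpha}\big), \\
H(3\eta/2) \cdot s &= o\big((f(n))^{-\alpha}\big) \cdot O\big(k\,(f(n))^{\alpha}\big) = o(k),
\end{align*}
% so the factor \exp(O(H(3\eta/2) \cdot s)) from Theorem 1.3 is 2^{o(k)}.
```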


⁴Though a similar algorithm was also proposed by Hopper and Blum [HB01].